THE USE OF DYNAMIC PROGRAMMING IN
AN OCCUPATIONAL ENVIRONMENTAL PROBLEM

INTRODUCTION 

The National Institute for Occupational Safety and Health (NIOSH) proposal
for a sampling strategy can be used to satisfy an employer's objective to
provide a work environment which can attain 95% confidence that no more than
5% of employee exposure days are over the permissible exposure limit (3,
p. 29). The sampling strategy was developed using a particular stochastic
model for the concentration measurements (3, p. 17). In this report, a
procedure for finding an optimal sampling strategy is presented, using the
technique of dynamic programming (DP). The employer's objective function used
in this report is not that stated above, but instead is based on cost
criteria. However, it will be shown that if an optimal sampling strategy does
not satisfy the requirement that no more than a certain fraction of employee
exposure days are over the permissible exposure limit, this requirement can be
used as a constraint, and a nonoptimal sampling strategy can be used instead.

To apply DP, it was necessary to a) represent the concentration levels by
a stochastic model, and b) introduce a cost structure for the employer.
Samples taken of the concentration level (and perhaps of other related
variables) should be used to identify a stochastic model, to estimate its
parameters, and to help make decisions concerning control of the employer's
process. The first two uses for the samples are considered to be of a
statistical nature and will not be discussed in this report. Thus, it is
assumed that enough samples have been taken so that a stochastic model of the
concentration is known, and values for its parameters have been determined.
The only use to be made of the sampled data will be to make optimal decisions
to minimize the cost of the process. To define a cost structure, consider the
employer's process to be operating under "steady state" conditions and
consider a fixed interval of time (such as a day). The following costs are
defined:

c1 is the cost of running the process (i.e., the cost of production)
over the interval of time.

c2 is the cost of making a measurement of the concentration level during
the interval of time.

c3 is the cost of not being able to use the process in a productive way
over the interval of time.

c4 is the cost of exceeding the permissible exposure limit over the
interval of time.

It is assumed that a) c4 cannot be assessed at the end of the time interval
unless a measurement has been made during the time interval, b) the process
can be carried out purely for the purpose of making a measurement, with no
employees subject to exposure and no penalty cost involved, and c) c3 is
greater than c1.

Three  decisions  can  be  made  to  control  the  process: 

Decision 1. The process will be ongoing during the next time interval,
but no measurement is made. The cost involved is equal to c1.

Decision 2. The process will be ongoing during the next time interval,
and a measurement will be made. The expected cost is equal
to c1 + c2 + c4 P(A), where A is the event that the
concentration level exceeds the permissible exposure limit.

Decision 3. The process will be carried out solely for the purpose of
making a measurement. The expected cost is equal to c2 + c3.
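The three expected one-interval costs above can be written as a small function. This is only an illustrative sketch: the function name `expected_costs` and the value used for P(A) are assumptions, while the cost values are those of Example 1.

```python
def expected_costs(c1, c2, c3, c4, p_exceed):
    """Expected cost over one time interval for decisions 1, 2 and 3.

    p_exceed plays the role of P(A), the probability that the
    concentration level exceeds the permissible exposure limit.
    """
    return {
        1: c1,                       # run the process, no measurement
        2: c1 + c2 + c4 * p_exceed,  # run the process and measure
        3: c2 + c3,                  # run solely to make a measurement
    }


# Costs of Example 1; p_exceed = 0.05 is a hypothetical value.
costs = expected_costs(c1=1.0, c2=0.2, c3=3.1, c4=2.0, p_exceed=0.05)
for decision, cost in costs.items():
    print(decision, round(cost, 3))
```

Decision 1 is always the cheapest over a single interval, which is why a finite maximum interval between measurements has to be imposed later in the report.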

While  the  above  assumptions  may  be  considered  simple,  they  were  chosen 
to  illustrate  the  use  of  DP  as  an  approach  to  the  problem  of  developing 
sampling  strategies.  The  statistical  considerations  can  also  be  incorporated 
into  the  model,  but  in  the  author's  opinion,  such  a  development  should  be 
accomplished only after some time series data for the concentration levels are
made available for investigation.

In  what  follows,  an  optimal  policy  will  be  derived,  that  is,  rules  will 
be  stated  such  that  for  every  state  of  the  process,  one  of  the  above  three 
decisions  will  be  chosen  and  the  expected  cost  will  be  a  minimum.  The  rules 
will  be  applied  to  some  specific  numerical  examples. 

MARKOV  DECISION  PROCESSES 

To illustrate the concepts involved in a DP formulation, assume that the
values of the concentration level are classified as being in one of I ordered
intervals. The Ith interval contains only concentration levels greater than
the permissible level, and the first interval contains the lowest values of
concentration levels. Let the concentration level which could be measured
during time interval n be classified as being in one of the I intervals, and
let Xn take the value of the interval in which the concentration level
lies. Assume that Xn is an irreducible aperiodic Markov chain with (known)
stationary transition probability matrix P. The (i, j)th element of P is
denoted by $p_{ij}$, and the elements of the stationary distribution $\pi$ (a
row vector) by $\pi_j$. By choosing the concentration level to be a Markov
chain, use can be made of the theory of Markov Decision Processes (2; 6,
pp. 739-765).


Consider finding a sequence of decisions which are optimal when the
expected cost is to be minimized for an infinite horizon (6, pp. 359-392).
The two cost criteria in common use are the expected cost per unit of time and
the present value of the expected total cost over the infinite horizon.
When choosing the latter, the equations to be solved are (2, p. 80; 6, p.
741):

$$ y_i = \min_{d \in D} \Big\{ c_{id} + \alpha \sum_{j \in S} p(j \mid i, d)\, y_j \Big\}, \qquad 0 < \alpha < 1, \quad i \in S \tag{1} $$

where S is the set of all possible states of the Markov chain, D is the set of
all possible decisions, $y_i$ is the present value of the total expected cost
when the process is in state i and an optimal policy is used, $c_{id}$ is the
expected cost during one interval of time when the process is in state i and
decision d is made, p(j|i,d) is the probability of going to state j from state
i when decision d is made, and $\alpha$ is the discount factor. When the
process is in state i, decision $d_i$ will always be made.

While Xn denotes the level of the concentration, the decision maker has
knowledge of the concentration level only when a measurement has been made.
It is thus necessary to expand the set of states S to be the pairs (X, Tn),
where Tn is the number of time intervals, measured from the start of time
interval n, that have passed since a measurement was made, and X is the
concentration level when the last measurement was made. It will now be
assumed that the decision as to whether or not to make a measurement is made
at the beginning of the time interval, and that Xn represents the
concentration level at the beginning of the nth time interval. When a
measurement is made, the value observed is assumed to be the concentration
level at the end of the time interval. Thus, if Tn = 1, X = Xn. The maximum
value Tn can take will be denoted by T. It is necessary to specify a finite
value for T, for otherwise (as will be seen below) the optimal policy would be
never to take a measurement, and thus forever to avoid a penalty cost and a
measurement cost. It is useful to think of T as the value of an interval
between measurements which an inspector uses when he comes to measure the
employer's compliance with regulations. When Tn = T, only decisions 2 and 3
will be allowed. The augmented state (X, Tn) is still a Markov chain, and
expressions for the elements p(j|i,d) to be used in equation 1 are given in
Appendix A. It is possible to gain some insight from the special case where a
measurement is made during every time interval (T = 1). An important
parameter is the ratio of costs given by

$$ h = (c_3 - c_1) / c_4 . \tag{2} $$

The following is shown in Appendix A to be valid: When the measurement X = i,
then decision 2 is optimal for those states i for which

$$ p_{iI} < h, \qquad i = 1, 2, \ldots, I $$

and otherwise decision 3 is optimal. Thus, if h > 1, decision 2 is always
optimal for all outcomes i. Similarly, when Tn = T, and X = i, then
decision 2 is optimal for those states i for which

$$ p^{(T)}_{iI} < h, \qquad i = 1, 2, \ldots, I $$

and otherwise decision 3 is optimal, where $p^{(T)}_{iI}$ is the probability
that Xn goes from state i to state I in T intervals. After having determined
the optimal decisions, equation 1 can be used to compute the costs.
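The threshold rule for T = 1 is easy to apply by hand, and the following sketch does so in code. The helper names `cost_ratio` and `decisions_when_measuring` are hypothetical; the costs and the last column of P are those given for Example 4 later in this report.

```python
def cost_ratio(c1, c3, c4):
    """Equation 2: h = (c3 - c1) / c4."""
    return (c3 - c1) / c4


def decisions_when_measuring(p_exceed_by_state, h):
    """Decision 2 where the exceedance probability is below h, else 3."""
    return [2 if p < h else 3 for p in p_exceed_by_state]


# Example 4: c1 = .7, c3 = 1, c4 = 2, and the last column of P,
# i.e. the probabilities p_iI of moving from level i to the top level I.
h = cost_ratio(c1=0.7, c3=1.0, c4=2.0)
last_column = [0.02, 0.38, 0.58, 0.02]
policy = decisions_when_measuring(last_column, h)
print(round(h, 2), policy)
```

With these numbers the lowest and the highest measured levels lead to decision 2 and the two intermediate levels to decision 3, which shows that the rule need not be monotone in the measured level.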

As T increases, $p^{(T)}_{iI}$ approaches $\pi_I$ for all i. Thus, the
information gained from the measurement becomes useless for the purpose of
making an optimal decision. This result indicates that if the Markov chain
were an independent process, it is not optimal ever to make a measurement.
(This conclusion can also be extended to any time series model of the process
which consists of the sum of a deterministic process and a completely random
process.) Also, as T increases, it is shown in Appendix A that when Tn < T,
decision 1 is always optimal.
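The convergence of the T-step probabilities to the stationary value can be checked numerically. The sketch below assumes NumPy and uses the transition matrix of Example 4; the function name `stationary_distribution` is an illustrative choice, and the method (replacing one balance equation by the normalization) is one of several ways to obtain the stationary vector.

```python
import numpy as np


def stationary_distribution(P):
    """Return the row vector pi with pi = pi P and sum(pi) = 1."""
    n = P.shape[0]
    A = P.T - np.eye(n)   # balance equations (P' - I) pi' = 0
    A[-1, :] = 1.0        # replace one of them by the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


# Transition matrix of Example 4 (last state = above the permissible limit).
P = np.array([
    [0.00, 0.49, 0.49, 0.02],
    [0.30, 0.02, 0.30, 0.38],
    [0.20, 0.20, 0.02, 0.58],
    [0.18, 0.40, 0.40, 0.02],
])

pi = stationary_distribution(P)
for T in (1, 2, 4, 8, 16):
    last_col = np.linalg.matrix_power(P, T)[:, -1]   # p_iI^(T), i = 1..I
    print(T, np.round(last_col, 4), "pi_I =", round(float(pi[-1]), 4))
```

As T grows, every entry of the printed column approaches the same number, so the last measured level carries less and less information about the present level.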

It can be seen that for any particular P, the value of $\pi_I$ may be larger
than allowed by regulations. However, since a decision can be made to carry
out the process without any employees present, it is still possible to achieve
satisfactory values for the probability that an employee will be exposed to a
concentration level above the permissible limit (denoted by P(B)). This
goal may have to be achieved at a higher cost than would be the case if such a
constraint were not present.

Some examples, solved numerically by using the policy iteration algorithm
(2, 6), will follow. A description of the program is given in Appendix B. In
the examples, small numbers are chosen for the costs, for as can be seen in
equation 2, it is the ratio of costs which is important.

Example 1. Let c1 = 1, c2 = .2, c3 = 3.1, c4 = 2, $\alpha$ = .98, and let P be
the 3 x 3 transition matrix of equation 3.
Using equation 2, h = 1.05, and thus if T = 1, decision 2 is always optimal.
When T > 1, the solution was such that decision 2 is always optimal when Tn
= T, and decision 1 is optimal otherwise. Table 1 shows how the expected
future cost, C, and the probability of an employee exceeding the permissible
concentration level, P(B), change with the length of the inspection interval,
T. The stationary solution of equation 3 yields $\pi_3$ = .051. If the maximum
allowable value of P(B) were set to be equal to .05, then the employer would
not be in compliance with the regulations. By using the nonoptimal decision
vector d' = (2, 2, 3) when T = 1, it can be shown that C = 70.0, and P(B) =
.046. Similarly, when T = 2, if d' = (1, 1, 1, 2, 2, 3), C = 60.0 and P(B) =
.049. This example illustrates how the constraint P(B) < .05 increases the
cost.

Example 2. The same parameters as in Example 1 are used, but the penalty
cost is increased to c4 = 105. By using equation 2, h = .02, and since
$p_{i3}$ > .02 for i = 1, 2, and 3, when T = 1, decision 3 is always optimal.
If T > 1, decision 3 is always optimal when Tn = T, and decision 1 is optimal
otherwise. Thus, when T = 1, the process is never used if the optimal policy
is applied. By using the nonoptimal policy d' = (2, 3, 3), it can be shown
that the cost increased to C = 207.0 and P(B) = .024. If T > 1, then P(B) $\le$
.05, so that the process can be used. Thus, inspecting less often makes the
process "acceptable".

Example 3. Changing the penalty cost to c4 = 17.5 gives h = .12, and when
T = 1, $p_{13}$ and $p_{33}$ < h, but $p_{23}$ > h. Thus, low or high values
of the measured concentration level yield decision 2 as optimal, but an
intermediate value leads to decision 3. Also P(B) = .029 < .05. Increasing T
to values greater than 1 leads to decision 2 being optimal when Tn = T, and
decision 1 being optimal otherwise. However, P(B) = .051 > .05 for all T > 1.
Using a nonoptimal decision equal to 3 for state (3,2) increases the cost from
77.5 to 77.7, and reduces P(B) to .049.

TABLE 1. EXPECTED FUTURE COST AND PROBABILITY OF EMPLOYEE EXCEEDING
PERMISSIBLE CONCENTRATION LEVEL VERSUS INSPECTION INTERVAL

         Example 1       Example 2       Example 3       Example 4
  T      C     P(B)      C     P(B)      C     P(B)      C     P(B)
  1    65.1    .051    165.0   .000    101.3   .029     4.46   .009
  2    57.6    .051    107.5   .026     77.5   .051     4.10   .231
  3    55.0    .051     88.3   .034     68.3   .051     3.97   .178
  4    53.8    .051     78.8   .039     63.7   .051     3.87   .220
  5    53.0    .051     73.0   .041     61.0   .051     3.81   .218
  6    52.5    .051     69.2   .043     59.2   .051     3.76   .227

Example 4. Let c1 = .7, c2 = .01, c3 = 1, c4 = 2, $\alpha$ = .8 and let

$$ P = \begin{pmatrix} .00 & .49 & .49 & .02 \\ .30 & .02 & .30 & .38 \\ .20 & .20 & .02 & .58 \\ .18 & .40 & .40 & .02 \end{pmatrix} $$

In this example, for T > 1 and Tn < T, decision 1 is not always optimal.
When Tn < T and decision 1 is not always optimal, the matrix p(j|i,d)
contains transient states. In this example, when T equals two, d' = (2, 1,
1, 2, 3, 3, 2, 3), and augmented states 5 and 8 cannot be reached because
measurements are made when the system is in the augmented states 1 and 4. When
evaluating C and P(B) in Table 1, the transient states were eliminated.
Notice that for the decision vector shown above, a measurement of
concentration at the lowest level yields a decision to make another
measurement, while a measurement at either of the next two highest levels
yields a decision not to make another measurement.

DP  USING  ARMA  PROCESSES 

Instead of treating the concentration level as a discrete random variable,
we can consider it to be a continuous random variable and use an
autoregressive-moving average (ARMA) model (1). Such models usually have the
advantage of containing fewer parameters to estimate than a Markov chain
model. The DP equations to be solved are (4):

$$ y(x,t) = \min_{d \in D} \Big\{ c_{(x,t),d} + \alpha \sum_{t'} \int y(z,t')\, f(z,t' \mid (x,t),d)\, dz \Big\} \tag{4} $$

where the augmented state is again defined to be (X, Tn). The variable Xn
takes values x on the real line, and when Tn = t, X = $X_{n-t+1}$. In equation
4, f(z,t' | (x,t),d) is the conditional probability density of (X, Tn), given
that (X = x, Tn = t) and decision d was made at time n. All other variables
are as defined in equation 1. For the autoregressive model of order 1,
defined as

$$ X_n - \mu = \phi (X_{n-1} - \mu) + a_n, \qquad |\phi| < 1 \tag{5} $$

where $a_n$ is a normal white noise process with variance $\sigma_a^2$,
expressions for f are derived in Appendix C. Note that the process defined in
equation 5 is also a Markov process (5, p. 11). When T = 1, the following is
shown to be valid in Appendix C:


Let h be as defined in equation 2; let L be the permissible exposure
limit; and let z(h) be the solution of

$$ P(Z > z(h)) = h, \qquad 0 < h < 1 \tag{6} $$

where Z is a normally distributed random variable with mean equal to zero and
variance equal to 1. Then when the measurement X = x, decision 2 is optimal
for those states x for which

$$ x \phi < L - \mu (1 - \phi) - z(h)\, \sigma_a , \tag{7} $$

and otherwise decision 3 is optimal. Thus, if h > 1, decision 2 is optimal
for all outcomes x.
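The rule in equations 6 and 7 can be sketched with the standard library's normal distribution. The function names and the numerical values of the AR(1) parameters and of L below are hypothetical; only the form of the inequality comes from the text. The sketch assumes h > 0, which follows from the assumption that c3 is greater than c1.

```python
from statistics import NormalDist


def z_of_h(h):
    """Solve P(Z > z) = h for a standard normal Z (equation 6)."""
    return NormalDist().inv_cdf(1.0 - h)


def ar1_decision(x, mu, phi, sigma_a, limit, h):
    """Return decision 2 or 3 after measuring x when T = 1 (equation 7)."""
    if h >= 1.0:
        return 2  # the exceedance probability can never reach h
    bound = limit - mu * (1.0 - phi) - z_of_h(h) * sigma_a
    return 2 if phi * x < bound else 3


# Hypothetical parameters: mean 1, phi = 0.5, unit noise variance, L = 3.
params = dict(mu=1.0, phi=0.5, sigma_a=1.0, limit=3.0, h=0.5)
for x in (4.0, 6.0):
    # Direct check: P(X_{n+1} > L | X_n = x) compared with h.
    mean_next = params["mu"] + params["phi"] * (x - params["mu"])
    p_exceed = 1.0 - NormalDist(mean_next, params["sigma_a"]).cdf(params["limit"])
    print(x, ar1_decision(x, **params), round(p_exceed, 3))
```

The printed exceedance probability is below h exactly when the function returns decision 2, which is the statement that equation 7 is the T = 1 rule of the Markov chain case written for a normal one-step-ahead distribution.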

Further results for ARMA processes can be obtained, but one difficulty to
be dealt with is the problem of forecasting when there are missing
observations in the time series when ARMA processes more complex than the
above example are used.

CONCLUSION 

The sampling strategy proposed by NIOSH could lead to sequences of
consecutive measurements being made, with the length of any particular
sequence being determined by the outcomes of the previous measurements. The
above properties of the sampling strategy are also true for the sampling
strategies derived in this report, but with the following differences:

1. A maximum interval of time between samples must be specified.

2. It is not always true that a high concentration level leads to a
decision to make a measurement, and a low concentration level leads
to a decision not to make a measurement. The opposite can be the
optimal strategy.

3. As the specified maximum interval between samples increases, the
optimal decision is to never make a measurement unless required to.

4. A minimum cost policy is being invoked.

The  applicability  of  DP  is  not  limited  to  the  cost  structure  and  set  of 
decisions  used  in  this  report  that  were  chosen  to  illustrate  in  a  simple  but 
meaningful  way  the  concepts  involved  in  applying  the  technique  of  DP  to  the 
problem  of  developing  an  optimal  sampling  strategy. 

To  extend  the  DP  approach  to  include  the  problem  of  parameter  estimation, 
it  would  be  necessary  to  derive  expressions  for  the  joint  probability 
distribution  of  the  parameters  as  a  function  of  the  number  of  measurements, 
and  hence  it  would  be  necessary  to  use  a  finite  horizon  DP  formulation.  The 
problem  of  dimensionality  could  then  cause  computational  difficulties  (4,  p. 
65;  6). 

It  is  possible  to  use  ARMA  models  instead  of  Markov  chains  in  a  manner 
similar  to  that  discussed  in  this  report,  but  consideration  must  be  given  to 
the  problem  of  missing  observations  when  dealing  with  other  than  the  simple 
autoregressive  model. 
APPENDIX A


DERIVATION OF COSTS AND TRANSITION PROBABILITIES

In this Appendix, explicit expressions for use in equation 1 are derived,
and some solutions are obtained.

Let the augmented states Zn = (X, Tn) be ordered lexicographically
( (1,1), (2,1), ..., (1,2), (2,2), ..., (I,T) ), where (1,1) is denoted as
state one, and (I,T) as state IT.
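The ordering just described amounts to a simple index map, sketched below with hypothetical helper names. The last function lists the augmented states that can be entered under a given decision vector, under the stated assumption that a measurement can return any of the I levels; it is applied to the T = 2 decision vector reported for Example 4.

```python
def state_index(x, t, I):
    """1-based index of augmented state (x, t) in the ordering above."""
    return x + (t - 1) * I


def state_pair(i, I):
    """Inverse of state_index: the pair (x, t) for a 1-based index i."""
    return (i - 1) % I + 1, (i - 1) // I + 1


def reachable_states(policy, I):
    """Augmented states that can be entered under `policy`.

    Assumes every state (x, 1) can follow a measurement; decision 1
    moves state i to state i + I, decisions 2 and 3 return to t = 1.
    """
    reached = set(range(1, I + 1))
    frontier = list(reached)
    while frontier:
        i = frontier.pop()
        if policy[i - 1] == 1 and i + I not in reached:
            reached.add(i + I)
            frontier.append(i + I)
    return reached


# Decision vector reported for Example 4 with I = 4 and T = 2.
d = [2, 1, 1, 2, 3, 3, 2, 3]
unreachable = sorted(set(range(1, 9)) - reachable_states(d, I=4))
print(unreachable, [state_pair(i, 4) for i in unreachable])
```

States 5 and 8 correspond to the pairs (1,2) and (4,2), which cannot occur because a measurement is always made immediately after levels 1 and 4 are observed.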

The costs to be used in equation 1 are

$$ c_{i1} = c_1, \qquad 1 \le i \le I(T-1), \tag{A1} $$

$$ c_{i2} = c_1 + c_2 + c_4\, p^{(j+1)}_{i-jI,\,I}, \qquad jI + 1 \le i \le (j+1)I, \quad 0 \le j \le T-1, \tag{A2} $$

and

$$ c_{i3} = c_2 + c_3, \qquad 1 \le i \le IT, \tag{A3} $$

where $p^{(j)}_{ik}$ is the probability of a transition of Xn from state i
to state k in j time intervals. (Note that $c_{i1}$ is undefined if T = 1
or if i > I(T-1).)

If decision one is made, then the state (X, t) goes to (X, t+1) with
probability one. Thus,

$$ p(j \mid i, 1) = \begin{cases} 1 & \text{if } j = i + I, \quad 1 \le i \le I(T-1), \\ \text{undefined} & I(T-1) < i \le IT, \\ 0 & \text{otherwise.} \end{cases} \tag{A4} $$

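If decisions two or three are made, the state $(X_{n-t+1}, t)$ goes to
$(X_{n+1}, 1)$ in t time intervals. Thus,

$$ p(j \mid i, 2) = p(j \mid i, 3) = \begin{cases} p^{(t)}_{i-(t-1)I,\,j} & 1 \le j \le I, \quad (t-1)I + 1 \le i \le tI, \quad 1 \le t \le T, \\ 0 & \text{otherwise.} \end{cases} \tag{A5} $$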


Equation 1 can be solved using the value iteration or the policy
iteration algorithms (2, 6), and a computer program was written using the
policy iteration algorithm. (See Appendix B for the listing.)
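As a compact counterpart to the listing in Appendix B, the following sketch builds the augmented costs and transition matrices and runs policy iteration with NumPy. It is not the author's program: the function names are hypothetical, disallowed decisions are marked with an infinite cost, and the measurement transitions assume that a measurement made in state (x, t) leads to state (j, 1) with the t-step probability from x to j, as the Appendix B listing appears to do.

```python
import numpy as np


def build_model(P, T, c1, c2, c3, c4):
    """Costs and transitions of the augmented states, 0-based indices."""
    I = P.shape[0]
    n = I * T
    cost = np.full((n, 3), np.inf)       # inf marks a disallowed decision
    trans = np.zeros((3, n, n))
    Pt = np.eye(I)
    for t in range(1, T + 1):
        Pt = Pt @ P                      # t-step transition matrix
        for x in range(I):
            i = x + (t - 1) * I
            if t < T:                    # decision 1 only allowed if t < T
                cost[i, 0] = c1
                trans[0, i, i + I] = 1.0
            cost[i, 1] = c1 + c2 + c4 * Pt[x, -1]
            cost[i, 2] = c2 + c3
            trans[1, i, :I] = Pt[x]      # a measurement resets t to 1
            trans[2, i, :I] = Pt[x]
    return cost, trans


def policy_iteration(cost, trans, alpha, max_iter=50):
    """Return the decisions (1, 2 or 3 per state) and the costs y."""
    n = cost.shape[0]
    rows = np.arange(n)
    policy = np.full(n, 2)               # start from decision 3 everywhere
    for _ in range(max_iter):
        # Policy evaluation: solve (I - alpha P_d) y = c_d.
        y = np.linalg.solve(np.eye(n) - alpha * trans[policy, rows],
                            cost[rows, policy])
        # Policy improvement: minimize the right-hand side of equation 1.
        new_policy = (cost + alpha * (trans @ y).T).argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy + 1, y


P = np.array([[0.00, 0.49, 0.49, 0.02], [0.30, 0.02, 0.30, 0.38],
              [0.20, 0.20, 0.02, 0.58], [0.18, 0.40, 0.40, 0.02]])
for T in (1, 2):
    cost, trans = build_model(P, T, c1=0.7, c2=0.01, c3=1.0, c4=2.0)
    policy, y = policy_iteration(cost, trans, alpha=0.8)
    print(T, policy, np.round(y, 3))
```

The data are those of Example 4, so the T = 2 result can be compared with the decision vector d' = (2, 1, 1, 2, 3, 3, 2, 3) quoted in that example.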

Consider the special case where a measurement is made during every
interval of time (T = 1). Then only decisions 2 or 3 are allowed. Since
equation A5 applies whichever decision is made, equations A2 and A3 show that
decision 2 is the optimal policy if $p_{iI} < h$ for all outcomes X = i,
i = 1, 2, ..., I (see equation 2), and decision 3 is optimal otherwise. For
T > 1, when Tn = T and the result of the last measurement was X = i, decision
two is optimal if $p^{(T)}_{iI} < h$, and decision 3 is optimal otherwise. If
the optimal decisions when in the augmented state j are found to be

$$ d_j = \begin{cases} 1 & 1 \le j \le (T-1)I, \\ 2 \text{ or } 3 & (T-1)I + 1 \le j \le IT, \end{cases} \tag{A6} $$

then the resulting equations for the minimum costs have a simple form which
can be found by using equations A1 through A6 in

$$ y_i = c_{i d_i} + \alpha \sum_{j \in S} p(j \mid i, d_i)\, y_j, \qquad 0 < \alpha < 1, \quad i \in S. \tag{A7} $$
Letting u be a column vector with components

$$ u_i = y_{i+(T-1)I}, \qquad 1 \le i \le I, \tag{A8} $$

and v a column vector with components

$$ v_i = \frac{c_1 \alpha (1 - \alpha^{T-1})}{1 - \alpha} + c_2 + \min \{ c_1 + c_4\, p^{(T)}_{iI};\; c_3 \}, \qquad 1 \le i \le I, \tag{A9} $$

the solution of equation A7 is given by

$$ u = (I - \alpha^T P^T)^{-1} v \tag{A10} $$

where I is the identity matrix and $P^T$ denotes the Tth power of P. The
other costs are given by

$$ y_i = \alpha^{T-k-1} u_{i-kI} + \frac{c_1 (1 - \alpha^{T-k-1})}{1 - \alpha}, \qquad 1 + kI \le i \le (k+1)I, \quad 0 \le k \le T-2. \tag{A11} $$

If it is assumed a priori that equation A6 is the solution to equation 1,
then equations A8 through A11 can be used on the right-hand side of equation
1. If the solution of equation 1 is identical to that given by equations A8
through A11, then equation A6 is the solution (5, p. 128). It can be shown
that equation A6 is the solution if all of the following inequalities are
satisfied:

$$ \begin{aligned} \alpha^{T-i} (I - \alpha^i P^i)(I - \alpha^T P^T)^{-1} \min \{ c_3 \mathbf{1};\; c_1 \mathbf{1} + c_4 P^{T-1} P_I \} < {} & \min \{ c_3 \mathbf{1};\; c_1 \mathbf{1} + c_4 P^{i-1} P_I \} \\ & + (c_2 - c_1) \frac{1 - \alpha^{T-i}}{1 - \alpha^T}\, \mathbf{1}, \qquad i = 1, 2, \ldots, T-1 \end{aligned} \tag{A12} $$

where $\mathbf{1}$ is a column vector all of whose I elements equal one, $P_I$
is the Ith column of P, and $P^0$ is the identity matrix. Equation A12 is
satisfied if

$$ c_3 \mathbf{1} < c_1 \mathbf{1} + c_4 P^{i-1} P_I $$

for all i = 1, 2, 3, ..., T-1 or vice versa. Other combinations of parameters
for which equation A12 is satisfied are more difficult to find. As T becomes
large, equation A12 will be satisfied, and equation A6 is the solution.

Let the probability transition matrix of the augmented states Zn under
the optimal policy be denoted by P*, and let the stationary solution for P* be
denoted by the row vector $\pi^*$. Then the present value of the total
expected cost using an optimal policy is given by

$$ C = \sum_{i=1}^{IT} \pi^*_i\, y_i . \tag{A13} $$


The probability that the process exceeds the permissible exposure limit
is given by $\pi_I$. However, since it is possible not to have employees
present when the process exceeds the permissible exposure limit, the
probability that the process exceeds the permissible exposure limit when
employees are present is given by

$$ P(B) = \sum_{j=0}^{T-1} \; \sum_{\substack{i = 1 + jI \\ i \in F}}^{(j+1)I} \pi^*_i\, p^{(j+1)}_{i-jI,\,I} \tag{A14} $$

where F is the set of augmented states for which the optimal decision is not
equal to 3.
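The two summary quantities C and P(B) can be evaluated with a few lines of NumPy. The sketch below is an illustration under stated assumptions: the function names are hypothetical, P* is assumed to be irreducible (transient states already removed), and the case shown is Example 4 with T = 1, for which P* equals P and the optimal decisions are 2, 3, 3, 2.

```python
import numpy as np


def stationary(Pm):
    """Stationary row vector of an irreducible transition matrix."""
    n = Pm.shape[0]
    A = Pm.T - np.eye(n)
    A[-1, :] = 1.0                      # normalization replaces one equation
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def cost_and_exposure(P_star, y, policy, exceed_prob):
    """C as the pi*-weighted mean of y, and P(B) summed over states in F.

    exceed_prob[i] is the probability that the limit is exceeded when
    the process is run from augmented state i.
    """
    pi_star = stationary(P_star)
    C = float(pi_star @ y)
    in_F = np.asarray(policy) != 3      # employees present unless decision 3
    PB = float(np.sum(pi_star[in_F] * exceed_prob[in_F]))
    return C, PB


P = np.array([[0.00, 0.49, 0.49, 0.02], [0.30, 0.02, 0.30, 0.38],
              [0.20, 0.20, 0.02, 0.58], [0.18, 0.40, 0.40, 0.02]])
alpha, policy = 0.8, [2, 3, 3, 2]
# One-interval costs for c1 = .7, c2 = .01, c3 = 1, c4 = 2.
c = np.array([0.71 + 2.0 * P[i, -1] if d == 2 else 1.01
              for i, d in enumerate(policy)])
y = np.linalg.solve(np.eye(4) - alpha * P, c)
C, PB = cost_and_exposure(P, y, policy, exceed_prob=P[:, -1])
print(round(C, 2), round(PB, 3))
```

The printed values can be compared with the T = 1 entries for Example 4 in Table 1.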

In the special case given in equation A6, use of equations A4 and A5
yields

$$ \pi^* = (\pi, \pi, \ldots, \pi) / T . \tag{A15} $$
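Equation A15 can be verified numerically for a particular chain. The sketch below builds the augmented matrix for the policy of equation A6 with a measurement (decision 2) at t = T, and also sums the exposure probability over all augmented states. Function names are hypothetical, and the matrix of Example 4 is used only as convenient test data.

```python
import numpy as np


def stationary(Pm):
    """Stationary row vector of an irreducible transition matrix."""
    n = Pm.shape[0]
    A = Pm.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def augmented_matrix_A6(P, T):
    """P* when decision 1 is used for t < T and a measurement at t = T."""
    I = P.shape[0]
    n = I * T
    Ps = np.zeros((n, n))
    for i in range(n - I):
        Ps[i, i + I] = 1.0                           # (x, t) -> (x, t + 1)
    Ps[n - I:, :I] = np.linalg.matrix_power(P, T)    # (x, T) -> (j, 1)
    return Ps


def exposure_probability_A6(P, T):
    """P(B) when no augmented state uses decision 3."""
    I = P.shape[0]
    pi_star = stationary(augmented_matrix_A6(P, T))
    total, Pt = 0.0, np.eye(I)
    for t in range(1, T + 1):
        Pt = Pt @ P
        total += float(pi_star[(t - 1) * I: t * I] @ Pt[:, -1])
    return total


P = np.array([[0.00, 0.49, 0.49, 0.02], [0.30, 0.02, 0.30, 0.38],
              [0.20, 0.20, 0.02, 0.58], [0.18, 0.40, 0.40, 0.02]])
T = 3
pi = stationary(P)
pi_star = stationary(augmented_matrix_A6(P, T))
print(np.allclose(pi_star, np.tile(pi, T) / T))
print(round(exposure_probability_A6(P, T), 4), round(float(pi[-1]), 4))
```

Under this policy P(B) equals the stationary probability of the top level for every T, which is consistent with the constant P(B) column of Example 1 in Table 1.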

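Finally, we note that as T becomes large, 1) u approaches v, where the
components of v approach

$$ \lim_{T \to \infty} v_i = \frac{c_1 \alpha}{1 - \alpha} + c_2 + \min \{ c_1 + c_4 \pi_I;\; c_3 \} \tag{A16} $$

and 2)

$$ \lim_{T \to \infty} P^T = \begin{pmatrix} \pi \\ \vdots \\ \pi \end{pmatrix} . \tag{A17} $$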
APPENDIX B


DESCRIPTION  OF  THE  POLICY  ITERATION  PROGRAM 

This is a description of the capabilities of the program entitled "Policy
Iteration," a listing of which appears below. The program computes the
optimal policy and the present value of the total expected costs $y_i$ by
solving equation 1 using the policy iteration algorithm (2, 6). The
probability matrices p(j|i,d) are given by equations A4 and A5, and the
expected costs $c_{id}$ by equations A1 through A3. In addition, the
stationary solution of P and of P*, the present value of the total expected
cost C, given by equation A13, and P(B), given by equation A14, are also
computed. If the Markov chain defined by P* is not irreducible, an
irreducible chain is found by eliminating the transient states, and C and
P(B) are computed for this irreducible chain.

The output consists of a listing of the input data c1, c2, c3, c4, $\alpha$,
P and its dimensions, T, and the initial policy vector. The computed results
displayed are p(j|i,d) (if desired), $c_{id}$, the optimal policy vector, P*,
$\pi$, $\pi^*$, C, P(B), and the number of iterations needed for converging
(ITN). A sample output appears in this Appendix.

Table B1 lists the order of the data necessary to run the program.


TABLE B1. ORDER OF INPUT DATA FOR THE PROGRAM

Order  Symbol    Description                                  Format
1.     IST       Number of states in Markov chain             I2
                 (IST = 0 stops execution.)
2.     IPRINT    Print option for matrices p(j|i,d):          I2
                 IPRINT = 0, no print;
                 IPRINT = 1, print.
3.     COST(4)   Costs (c1, c2, c3, c4).                      4E8.0
4.     ALPHA     Discount factor.                             E8.0
5.     P(I,J)    Transition matrix P, of dimension            (IST)E8.0
                 (IST) x (IST), one row per record,
                 IST records.
6.     TINT      Maximum value of T. If TINT = 0,             I2
                 program reads new IST, IPRINT, etc.
7.     DEC(I)    Initial policy vector of length              (IT)I2
                 (IST) x (TINT) = IT.

The source code is written for use on a VAX 11/780. The input file for the
example problem is:

02
01
    1.D0    .5D0    1.D0   10.D0
   .95D0
   .85D0   .15D0
   .80D0   .20D0
02
03030303
00
00


C  FILE PITD.FOR
C -- POLICY ITERATION, A.L.SWEET, AUGUST 1980
      IMPLICIT REAL*8(A-H,O-Z)
      INTEGER TINT,DEC,TDEC
      DIMENSION COST(4),P(30,30),DEC(30),CM(30,3),R(30,30),
     -  PK(30,30),NC(30),PKT(30,30),PM(3,30,30),Y(30),WYE(30),
     -  TDEC(30),PI(30)
      OPEN(UNIT=09,NAME='PITDIN.DAT',TYPE='OLD',DISP='KEEP')
      OPEN(UNIT=10,NAME='PITDOUT.DAT',TYPE='NEW',DISP='KEEP')
C -- READ INPUT
C -- READ NO OF STATES IN MARKOV CHAIN
 1000 READ(09,100) IST
  100 FORMAT(30I2)
      IF(IST.EQ.0) GO TO 999
      READ(09,100) IPRINT
C -- READ COSTS
      READ(09,101)COST
  101 FORMAT(10D8.0)
C -- READ DISCOUNT FACTOR
      READ(09,101)ALPHA
C -- READ TRANSITION MATRIX FOR MARKOV CHAIN
      DO 1 I=1,IST
    1 READ(09,101)(P(I,J),J=1,IST)
C -- READ MAXIMUM INTERVAL BETWEEN MEASUREMENTS
 2000 READ(09,100) TINT
      IF(TINT.EQ.0) GO TO 1000
      IT=IST*TINT
C -- READ INITIAL POLICY COLUMN VECTOR, LENGTH IT
      READ(09,100) (DEC(I),I=1,IT)
C -- PRINT INPUT
      WRITE(10,200)IST,TINT
  200 FORMAT(1H1,4X,29HNO OF STATES IN MARKOV CHAIN=,I4/
     -5X,38HMAXIMUM INTERVAL BETWEEN MEASUREMENTS=,I4)
      WRITE(10,201)COST
  201 FORMAT(1H0,4X,19HCOST OF PRODUCTION=,F7.2/
     -5X,20HCOST OF MEASUREMENT=,F7.2/
     -5X,28HCOST WHEN NOT IN PRODUCTION=,F7.2/
     -5X,38HCOST OF EXCEEDING MAX POLLUTION LEVEL=,F7.2)
      WRITE(10,202)
  202 FORMAT(1H0,5X,34HTRANSITION MATRIX FOR MARKOV CHAIN,/)
      DO 2 I=1,IST
    2 WRITE(10,203)I,(P(I,J),J=1,IST)
  203 FORMAT(1H0,I4,6G12.4/(1H ,4X,6G12.4))
      WRITE(10,206)
  206 FORMAT(1H0,4X,36HDECISION 1 -PLANT RUN FOR PRODUCTION/
     -5X,46HDECISION 2 -PLANT RUN FOR BOTH PRODUCTION AND ,
     -11HMEASUREMENT/
     -5X,46HDECISION 3 -PLANT RUN FOR MEASUREMENT ONLY,NO ,
     -10HPRODUCTION)

C -- COMPUTE R VECTORS AND POWERS OF P
C -- THERE ARE TINT R VECTORS OF DIMENSION IST EACH
C -- IN R(I,J), I IS VECTOR NO, I=1,TINT AND J IS ROW, J=1,IST
C -- PK CONTAINS POWERS OF P
      DO 4 I=1,IST
      DO 4 J=1,IST
      PK(I,J)=0.D0
      IF(I.EQ.J) PK(I,J)=1.D0
    4 CONTINUE
      C12=COST(1)+COST(2)
      C4=COST(4)
      IC2=0
      DO 20 KLOOP=1,TINT
      DO 6 I=1,IST
      SUM=0.D0
      DO 5 J=1,IST
    5 SUM=SUM+PK(I,J)*P(J,IST)
    6 R(KLOOP,I)=C12+C4*SUM
      IC1=IC2+1
      IC2=KLOOP*IST
      DO 7 I=1,IST
    7 CM(IC1+I-1,2)=R(KLOOP,I)
      DO 10 I=1,IST
      DO 9 J=1,IST
      SUM=0.D0
      DO 8 K=1,IST
    8 SUM=SUM+P(I,K)*PK(K,J)
    9 PKT(I,J)=SUM
   10 CONTINUE
      DO 11 I=1,IST
      DO 11 J=1,IST
      Z=PKT(I,J)
      PK(I,J)=Z
      PM(2,IC1+I-1,J)=Z
   11 PM(3,IC1+I-1,J)=Z
   20 CONTINUE

C -- STORE REST OF COST MATRIX, CM
      C23=COST(2)+COST(3)
      IT1=IT-IST
      IF(IT1.EQ.0) GO TO 251
      DO 25 I=1,IT1
      CM(I,3)=C23
   25 CM(I,1)=COST(1)
  251 IT2=IT1+1
      DO 26 I=IT2,IT


C -- STORE REST OF P2,P3
      IT3=IST+1
      IF(IT3.GT.IT) GO TO
      DO 27 I=1,IT
      DO 27 J=IT3,IT
      PM(2,I,J)=0.D0
   27 PM(3,I,J)=0.D0

C -- MAKE UP P1
      DO 28 I=1,IT
      DO 28 J=1,IT
   28 PM(1,I,J)=0.D0
      IT2=IT-IST
      IF(IT2.EQ.0) GO TO 30
      DO 29 I=1,IT2
   29 PM(1,I,IST+I)=1.D0
C -- PRINT P1, IF IPRINT = 1
   30 IF(IPRINT.EQ.0) GO TO 421
      WRITE(10,212)
  212 FORMAT(1H0,5X,29HTRANSITION MATRIX, DECISION 1)
      DO 41 I=1,IT
   41 WRITE(10,203)I,(PM(1,I,J),J=1,IT)
C -- PRINT P2 AND P3, IF IPRINT = 1
      WRITE(10,214)
  214 FORMAT(1H0,5X,36HTRANSITION MATRIX, DECISIONS 2 AND 3)
      DO 42 I=1,IT
   42 WRITE(10,203)I,(PM(2,I,J),J=1,IT)
C -- PRINT COST MATRIX
  421 WRITE(10,215)
  215 FORMAT(1H0,4X,11HCOST MATRIX,/)
      DO 43 I=1,IT
   43 WRITE(10,210)(CM(I,J),J=1,3)
  210 FORMAT(1H ,4X,6G12.4)
C -- PRINT INITIAL POLICY VECTOR AND DISCOUNT FACTOR
      WRITE(10,204)
  204 FORMAT(1H0,4X,21HINITIAL POLICY VECTOR,/)
      WRITE(10,209)(DEC(I),I=1,IT)
  209 FORMAT(1H0,10X,30I3)
      WRITE(10,205)ALPHA
  205 FORMAT(1H0,4X,6HALPHA=,F6.4)

C  ITERATION  LOOP 

DO  70  ITN=1 ,20 

C  —  COMPUTE  COEFFICIENT  MATRIX  FOR  ITERATION 
DO  51  1=1, IT 
DO  51  J=1 ,IT 
Z=-ALPHA*PM (DEC ( I ) , I , J ) 

IF(I.EQ.J)  Z=Z+1.D0 

51  PKT(I ,J)=Z 
C  —  INVERT  PKT 

CALL  GJR(PKT,IT,30,1.D-10,MFLG) 

IF (MFLG.NE.O)  GO  TO  99 
C  —  FIND  Y(I),I=1,IT 
DO  53  1=1, IT 
SUM=O.DO 
DO  52  J*1 ,IT 

52  SUM=SUM+PKT ( I , J )*CM(J ,DEC (J ) ) 

53  Y(I)=SUM 

C  —  FIND  WYE (I), 1=1, IT 
DO  56  1=1, IT 
TW-1.D06 
TDEC(I)«DEC(I) 

00  55  N-1,3 
SUM-0. DO 
DO  54  J-l.IT 




54  SUM=SUM+PM(N,I,J)*Y(J)
TWYE=ALPHA*SUM+CM(I,N)
IF(TWYE.GE.TW)  GO  TO  55
TW=TWYE
DEC(I)=N
55  CONTINUE
56  WYE(I)=TW
ISUM=0
DO  57  I=1,IT
57  ISUM=ISUM+IABS(DEC(I)-TDEC(I))
IF(ISUM.EQ.0)  GO  TO  71
70  CONTINUE
C  --  END  ITERATION  LOOP
71  WRITE(10,216)ITN
216  FORMAT(1H0,4X,14HOPTIMUM POLICY,5X,4HITN=,I2//5X,5HSTATE,
-  4X,8HDECISION,3X,10HMIN. COSTS)
DO  58  I=1,IT
NC(I)=I
58  WRITE(10,219)I,DEC(I),WYE(I)
219  FORMAT(6X,I2,8X,I2,8X,F7.3)
C  --  OPTIMUM  POLICY  TRANSITION  MATRIX
ITT=IT
DO  61  I=1,ITT
DO  61  J=1,ITT
PKT(I,J)=PM(DEC(I),I,J)
61  CONTINUE
611  WRITE(10,220)
220  FORMAT(1H0,4X,32HOPTIMUM POLICY TRANSITION MATRIX)
DO  62  I=1,ITT
62  WRITE(10,203)I,(PKT(I,J),J=1,ITT)
C  --  FIND  STATIONARY  SOLUTION  OF  OPTIMUM  POLICY  MATRIX
DO  63  I=1,ITT
DO  63  J=1,ITT
R(I,J)=PKT(J,I)
IF(I.EQ.J)  R(I,I)=R(I,I)-1.D0
63  CONTINUE
IT1=ITT-1
DO  64  I=1,IT1
64  Y(I)=-R(I,ITT)
CALL  GJR(R,IT1,30,1.D-10,MFLG)
IF(MFLG.NE.0)  GO  TO  680
DO  66  I=1,IT1
SUM=0.D0
DO  65  J=1,IT1
65  SUM=SUM+R(I,J)*Y(J)
66  PI(I)=SUM
PI(ITT)=1.D0
SUM=0.D0
DO  67  I=1,ITT
67  SUM=SUM+PI(I)
DO  68  I=1,ITT
68  PI(I)=PI(I)/SUM
WRITE(10,221)
221  FORMAT(1H0,4X,35HSTATIONARY SOLUTION,OPTIMUM POLICY ,
-  6HMATRIX,/)

WRITE(10,210)(PI(I),I=1,ITT) 

GO  TO  681 

C  --  MAKE  UP  REDUCED  OPTIMUM  POLICY  MATRIX
680  WRITE(10,217)
640  ITOG=0
DO  642  J=1,IT
IF(NC(J).EQ.0)  GO  TO  642
SUM=0.D0
DO  641  I=1,IT
IF(NC(I).EQ.0)  GO  TO  641
SUM=SUM+PKT(I,J)
641  CONTINUE
IF(SUM.NE.0.D0)  GO  TO  642
ITOG=1
NC(J)=0
642  CONTINUE
IF(ITOG.NE.0)  GO  TO  640
JN=0
DO  644  J=1,IT
IF(NC(J).EQ.0)  GO  TO  644
JN=JN+1
IN=0
DO  643  I=1,IT
IF(NC(I).EQ.0)  GO  TO  643
IN=IN+1
PKT(IN,JN)=PKT(I,J)
643  CONTINUE
644  CONTINUE
ITT=JN
GO  TO  611
C  --  MINIMUM  EXPECTED  COST
681  Z=0.D0
IN=0
DO  69  I=1,IT
IF(NC(I).EQ.0)  GO  TO  69
IN=IN+1
Z=Z+WYE(I)*PI(IN)
69  CONTINUE
WRITE(10,222)Z
222  FORMAT(1H0,4X,22HMINIMUM EXPECTED COST=,F7.3)
C  --  PROBABILITY  OF  WORKER  BEING  EXPOSED  TO  POLLUTION  LEVELS
C  --  ABOVE  MAXIMUM  PERMISSIBLE
DO  82  I=1,IST
DO  82  J=1,IST
PK(I,J)=0.D0
IF(I.EQ.J)  PK(I,J)=1.D0
82  CONTINUE
B=0.D0
IN=0
DO  90  JT=1,TINT
DO  86  I=1,IST
DO  85  J=1,IST
SUM=0.D0
DO  84  K=1,IST
84  SUM=SUM+P(I,K)*PK(K,J)
85  PKT(I,J)=SUM
86  CONTINUE
DO  87  I=1,IST
DO  87  J=1,IST
87  PK(I,J)=PKT(I,J)
SUM=0.
I1=1+(JT-1)*IST
I2=JT*IST
DO  89  I=I1,I2
IF(DEC(I).EQ.3)  GO  TO  89
IF(NC(I).EQ.0)  GO  TO  89
IN=IN+1
I3=I-(JT-1)*IST
SUM=SUM+PI(IN)*PK(I3,IST)
89  CONTINUE
B=B+SUM
90  CONTINUE
WRITE(10,224)B
224  FORMAT(1H0,4X,21HEXPOSURE PROBABILITY=,F6.3)
C  --  STATIONARY  SOLUTION  OF  TRANSITION  MATRIX
DO  73  I=1,IST
DO  73  J=1,IST
R(I,J)=P(J,I)
IF(I.EQ.J)  R(I,I)=R(I,I)-1.D0
73  CONTINUE
IT1=IST-1
DO  74  I=1,IT1
74  Y(I)=-R(I,IST)
CALL  GJR(R,IT1,30,1.D-10,MFLG)
IF(MFLG.NE.0)  GO  TO  99
DO  76  I=1,IT1
SUM=0.D0
DO  75  J=1,IT1
75  SUM=SUM+R(I,J)*Y(J)
76  PI(I)=SUM
PI(IST)=1.D0
SUM=0.D0
DO  77  I=1,IST
77  SUM=SUM+PI(I)
DO  78  I=1,IST
78  PI(I)=PI(I)/SUM
WRITE(10,223)
223  FORMAT(1H0,4X,37HSTATIONARY SOLUTION,TRANSITION MATRIX,/)
WRITE(10,211)(PI(I),I=1,IST)

GO  TO  2000 
999  STOP 

99  WRITE(10,217) 

217  FORMAT(1H0,5X,15HSINGULAR MATRIX)

STOP 

END 
C  *****

C  *GJR* 

C  ***** 

SUBROUTINE  GJR(A,N,MM,EPS,MFLG) 

IMPLICIT  REAL*8  (A-H,O-Z)  PC20805
C  GAUSS-JORDAN-RUTISHAUSER  MATRIX  INVERSION  WITH  DOUBLE  PIVOTING.  PC20806
DIMENSION  B(30),C(30),IP(30),IQ(30),A(MM,MM)
MFLG=0
DO  55  K=1,N  PC20810
C  DETERMINATION  OF  THE  PIVOT  ELEMENT  PC20811
ABSPIV=0.D0  PC20812
DO  5  I=K,N  PC20813
DO  5  J=K,N  PC20814
ABSAIJ=DABS(A(I,J))  PC20815
IF  (ABSPIV.GT.ABSAIJ)  GO  TO  5  PC20816
PIVOT=A(I,J)  PC20817
ABSPIV=ABSAIJ  PC20818
IP(K)=I  PC20819
IQ(K)=J  PC20820
5  CONTINUE  PC20821
IF  (ABSPIV.GT.EPS)  GO  TO  15  PC20822
MFLG=1
RETURN  PC20826
C  EXCHANGE  OF  THE  PIVOTAL  ROW  WITH  THE  KTH  ROW  PC20827
15  L=IP(K)
IF(L.EQ.K)  GO  TO  25
DO  20  J=1,N  PC20829
Z=A(L,J)  PC20831
A(L,J)=A(K,J)  PC20832
20  A(K,J)=Z  PC20833
C  EXCHANGE  OF  THE  PIVOTAL  COLUMN  WITH  THE  KTH  COLUMN  PC20834
25  L=IQ(K)  PC20837
IF(L.EQ.K)  GO  TO  35
DO  30  I=1,N  PC20836
Z=A(I,L)  PC20838
A(I,L)=A(I,K)  PC20839
30  A(I,K)=Z  PC20840
C  JORDAN  STEP  PC20841
35  DO  50  J=1,N  PC20842
IF  (J.EQ.K)  GO  TO  40  PC20843
B(J)=-A(K,J)/PIVOT  PC20844
C(J)=A(J,K)  PC20845
GO  TO  45  PC20846
40  B(J)=1.D0/PIVOT
C(J)=1.D0  PC20848
45  A(K,J)=0.D0  PC20849
50  A(J,K)=0.D0  PC20850
DO  55  I=1,N  PC20851
DO  55  J=1,N  PC20852
55  A(I,J)=A(I,J)+C(I)*B(J)  PC20853
C  REORDERING  THE  MATRIX  PC20854
DO  75  M=1,N  PC20855
NO  OF  STATES  IN  MARKOV  CHAIN=  2
MAXIMUM  INTERVAL  BETWEEN  MEASUREMENTS=  2

COST  OF  PRODUCTION=  0.00

COST  OF  MEASUREMENT=  0.50

COST  WHEN  NOT  IN  PRODUCTION=  1.00

COST  OF  EXCEEDING  MAX  POLLUTION  LEVEL=  10.00

TRANSITION  MATRIX  FOR  MARKOV  CHAIN 


1  0.8500  0.1500 

2  0.8000  0.2000 

DECISION  1  -PLANT  RUN  FOR  PRODUCTION 

DECISION  2  -PLANT  RUN  FOR  BOTH  PRODUCTION  AND  MEASUREMENT 

DECISION  3  -PLANT  RUN  FOR  MEASUREMENT  ONLY, NO  PRODUCTION 

TRANSITION  MATRIX,  DECISION  1

1  0.0000E+00  0.0000E+00  1.000       0.0000E+00
2  0.0000E+00  0.0000E+00  0.0000E+00  1.000
3  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
4  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00

TRANSITION  MATRIX,  DECISIONS  2  AND  3

1  0.8500  0.1500  0.0000E+00  0.0000E+00
2  0.8000  0.2000  0.0000E+00  0.0000E+00
3  0.8425  0.1575  0.0000E+00  0.0000E+00
4  0.8400  0.1600  0.0000E+00  0.0000E+00

COST  MATRIX

0.0000E+00  2.000  1.500
0.0000E+00  2.500  1.500
0.1000E+07  2.075  1.500
0.1000E+07  2.100  1.500

INITIAL  POLICY  VECTOR 

3  3  3  3 


ALPHA=0.9500
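
The printed cost matrix can be cross-checked against the input values at the top of this output. The sketch below is written in Python rather than FORTRAN and is not part of the original program. It assumes that C12 = COST(1)+COST(2) and C4 = COST(4), since those assignments fall outside the listing shown here, and that decision 1 receives the penalty value 1.D06 at the maximum interval, as the printed matrix suggests. The names `matmul` and `cost_matrix` are illustrative only.

```python
# Sketch only: rebuilds the cost matrix from the printed inputs.
# Assumptions (not visible in the listing): C12 = COST(1)+COST(2), C4 = COST(4),
# and decision 1 is penalised with 1.0e6 when the interval is at its maximum.

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cost_matrix(p, cost, max_interval, big=1.0e6):
    """One row per (interval t, last measured state i); columns are decisions 1, 2, 3."""
    ist = len(p)
    c1, c2, c3, c4 = cost
    rows = []
    pk = [row[:] for row in p]          # pk holds P**t, starting at t = 1
    for t in range(1, max_interval + 1):
        for i in range(ist):
            run_only = c1 if t < max_interval else big
            # pk[i][ist-1]: probability of being in the highest state t steps ahead
            run_and_measure = c1 + c2 + c4 * pk[i][ist - 1]
            measure_only = c2 + c3
            rows.append([run_only, run_and_measure, measure_only])
        pk = matmul(p, pk)
    return rows

P = [[0.85, 0.15], [0.80, 0.20]]
COST = (0.0, 0.5, 1.0, 10.0)            # production, measurement, idle, exceedance
CM = cost_matrix(P, COST, max_interval=2)
for row in CM:
    print(["%.4g" % v for v in row])
```

Under these assumptions the four rows agree with the printed values (2.000, 2.500, 2.075 and 2.100 in the second column), which supports the reading of the garbled assignment to R(KLOOP,I) as C12+C4*SUM.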


APPENDIX  C 


DERIVATION  OF  COSTS  WHEN  USING  CONTINUOUS  VARIABLES 
In this Appendix, explicit expressions for use in equation 4 are
derived. The costs to be used in equation 4 are

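$$c_{(x,t),1} = c_1, \qquad 1 \le t < T \tag{C1}$$

$$c_{(x,t),2} = c_1 + c_2 + c_4 P(A), \qquad 1 \le t \le T \tag{C2}$$

where

$$P(A) = P(X_{n+1} > L \mid X_{n-t+1} = x,\; T_n = t), \qquad 1 \le t \le T \tag{C3}$$

and

$$c_{(x,t),3} = c_2 + c_3, \qquad 1 \le t \le T . \tag{C4}$$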

In the above, $c_{(x,t),1}$ is undefined if $T = 1$ or if $t = T$. To derive an
expression for $P(A)$, note that the solution of equation 5, subject to an
initial value of $X_0$ equal to $x$, is

$$X_n = x\phi^n + \mu(1-\phi^n) + \sum_{i=1}^{n} \phi^{n-i} a_i, \qquad n \ge 1 . \tag{C5}$$

Since the $a_n$ are normal, $X_n$ is normal with expectation given by the first
two terms on the right-hand side of equation C5, and variance

$$V(X_n) = \sigma_a^2 \, (1-\phi^{2n}) / (1-\phi^2) . \tag{C6}$$


Thus,

$$P(A) = P\left( Z > \frac{L - x\phi^t - \mu(1-\phi^t)}{V(X_t)^{1/2}} \right) \tag{C7}$$


where $Z$ is a normal (0, 1) random variable.

If $T = 1$, it is only necessary to compare equations C2 and C4, and thus
equations 6 and 7 follow, since decision 2 is optimal if $P(A) < h$.
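
The expressions C5 to C7 can be evaluated numerically as in the following Python sketch, which is not part of the original report. It assumes the first-order autoregressive form implied by equation C5, with $|\phi| < 1$, and all parameter values in the example call are hypothetical. The function names are illustrative; the second function carries out only the comparison of equations C2 and C4 for the case $T = 1$.

```python
import math

def exceed_probability(x, t, limit, mu, phi, sigma_a):
    """P(A) as in C7: chance that the level exceeds `limit` t steps after observing x.

    Assumes |phi| < 1 and t >= 1.
    """
    mean = x * phi**t + mu * (1.0 - phi**t)                        # first two terms of C5
    variance = sigma_a**2 * (1.0 - phi**(2 * t)) / (1.0 - phi**2)  # C6 with n = t
    z = (limit - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))                     # P(Z > z), Z ~ N(0, 1)

def cheaper_decision_when_t_max_is_1(p_a, c1, c2, c3, c4):
    """Compare C2 with C4; decision 1 is undefined when T = 1."""
    cost_2 = c1 + c2 + c4 * p_a     # run the plant and measure
    cost_3 = c2 + c3                # measure only, no production
    return 2 if cost_2 < cost_3 else 3

# Hypothetical values, chosen only to exercise the two functions.
p_a = exceed_probability(x=1.0, t=1, limit=2.0, mu=1.0, phi=0.5, sigma_a=1.0)
decision = cheaper_decision_when_t_max_is_1(p_a, c1=0.0, c2=0.5, c3=1.0, c4=10.0)
print(round(p_a, 4), decision)
```

With these particular values the standardized threshold is 1, so the exceedance probability is that of a standard normal variable exceeding 1, and the exceedance cost is large enough that decision 3 is the cheaper one.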


